Abstract

We present an approach to decomposition and factor analysis of ma- 
trices with ordinal data. The matrix entries are grades to which objects 
represented by rows satisfy attributes represented by columns, e.g. grades 
to which an image is red, a product has a given feature, or a person
performs well in a test. We assume that the grades form a bounded scale
equipped with certain aggregation operators and conform to the structure
of a complete residuated lattice. We present a greedy approximation
algorithm for the problem of decomposing such a matrix into a product
of two matrices with grades under the restriction that the number of
factors be small. Our algorithm is based on a geometric insight provided by
a theorem identifying particular rectangular-shaped submatrices as opti- 
mal factors for the decompositions. These factors correspond to formal 
concepts of the input data and allow an easy interpretation of the decom- 
position. We present illustrative examples and experimental evaluation. 

Keywords: factor analysis, ordinal data, fuzzy relation, fuzzy logic, con- 
cept lattice 

1 Introduction 

Problem Description Data dimensionality reduction is fundamental for
understanding and management of data. In traditional approaches, such as
factor analysis, a decomposition of an object-variable matrix is sought
into an object-factor matrix and a factor-variable matrix with the number
of factors reasonably small. Compared to the original variables, the
factors are considered more fundamental concepts, which are hidden in the
data. Their discovery and interpretation, which is of central importance
in our paper, helps to better understand the data.

In this paper, we consider decompositions of matrices I with a particular
type of ordinal data. Namely, each entry I_ij of I represents a grade to
which the object corresponding to the ith row has, or is incident with, the
attribute corresponding to the jth column. Examples of such data are
results of questionnaires where respondents (rows) rate services, products,
etc., according to various criteria (columns); results of performance
evaluation of people (rows) by various tests (columns); or binary data, in
which case there are only two grades, 0 (no, failure) and 1 (yes, success).
Our goal is to decompose an n × m object-attribute matrix I into a product

$$I = A \circ B \tag{1}$$

of an n x k object-factor matrix A and a k x m factor-attribute matrix B with 
the number k of factors as small as possible. 

The scenario is thus similar to ordinary matrix decomposition problems but
there are important differences. First, we assume that the entries of I,
i.e. the grades, as well as the entries of A and B are taken from a bounded
scale L of grades, such as the real unit interval L = [0, 1] or the Likert
scale L = {1, . . . , 5} of degrees of satisfaction. Second, the matrix
composition operation ∘ used in our decompositions is not the usual matrix
product. Instead, we use the t-norm-based product, with a t-norm ⊗ being a
function used for aggregation of grades. In particular, A ∘ B is defined by

$$(A \circ B)_{ij} = \bigvee_{l=1}^{k} A_{il} \otimes B_{lj}, \tag{2}$$

where ⋁ denotes the supremum (maximum, if L is linearly ordered). The
ordinary Boolean matrix product is a particular case of this product in
which the scale L has 0 and 1 as the only grades and a ⊗ b = min(a, b).
Also, when A and B are thought of as fuzzy relations, A ∘ B is exactly the
usual composition of fuzzy relations, see e.g. [11, 13]. It is to be
emphasized that we attempt to treat graded incidence data in a way which is
compatible with its semantics. This need was recognized long ago in
mathematical psychology, in particular in measurement theory [15]. For
example, even if we represent the grades by numbers such as
0 ~ strongly disagree, 1/4 ~ disagree, . . . , 1 ~ strongly agree,
addition, multiplication by real numbers, and linear combination of graded
incidence data may not have a natural meaning. Consequently, decomposition
of a matrix I with grades into the ordinary matrix product of arbitrary
real-valued matrices A and B suffers from a difficulty to interpret A and
B, as well as to interpret the way I is reconstructed from, or explained
by, A and B. This is not to say that the usual matrix decompositions of
incidence data I may not be useful. For example, [19, 28] report that
decompositions of binary matrices into real-valued matrices may yield
better reconstruction accuracies. Hence, as far as the dimensionality
reduction aspect (the technical aspect) is concerned, ordinary
decompositions may be favorable. However, when the knowledge discovery
aspect plays a role, attention needs to be paid to the semantics of the
decomposition. Our algorithm is based on [3], in particular on using formal
concepts of I as factors. This is important both from the technical
viewpoint, since due to [3] optimal decompositions may be obtained this
way, and from the knowledge discovery viewpoint, since formal concepts may
naturally be interpreted.
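
The product (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are plain lists of lists, L is linearly ordered so that the supremum is a maximum, and the names `compose` and `goedel` are hypothetical helpers, not taken from the paper.

```python
from typing import Callable, List

Matrix = List[List[float]]


def goedel(a: float, b: float) -> float:
    """Minimum t-norm; on grades 0 and 1 it is Boolean conjunction."""
    return min(a, b)


def compose(A: Matrix, B: Matrix,
            tnorm: Callable[[float, float], float]) -> Matrix:
    """(A o B)_ij = max over l of tnorm(A_il, B_lj), cf. (2).

    Assumes a linearly ordered scale, so the supremum is a maximum,
    and at least one factor (k >= 1).
    """
    k = len(B)
    return [[max(tnorm(row[l], B[l][j]) for l in range(k))
             for j in range(len(B[0]))]
            for row in A]


# Boolean special case: grades 0 and 1 with the minimum t-norm.
A = [[1, 0], [1, 1]]
B = [[0, 1], [1, 0]]
print(compose(A, B, goedel))  # [[0, 1], [1, 1]]
```

With grades restricted to 0 and 1, the result agrees with the ordinary Boolean matrix product, as noted above.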

Related Work Recently, new methods of matrix decomposition and
dimensionality reduction have been developed. One aim is to have methods
which are capable of discovering possibly non-linear relationships between
the original space and the lower dimensional space [23, 29]. Another is
driven by the need to take into account constraints imposed by the
semantics of the data. Examples include nonnegative matrix factorization,
in which the matrices are constrained to those with nonnegative entries and
which leads to additive parts-based discovery of features in data [16].
Another example, relevant to this paper, is Boolean matrix decomposition.
Early work on this problem was done in [22, 26]
which already include complexity results showing the hardness of problems re- 
lated to Boolean matrix decompositions. Recent work on this topic includes 
[3 HH1 US [25]. As was mentioned above, Boolean matrix decomposition is 
a particular case of the problem considered in this paper. In particular, the 
present approach is inspired by [5]. 

Note also that partly related to this paper are methods for decomposition 
of binary matrices into non-binary ones such as [IT] [2H [35] [27] [33J > see also [25] 
for further references. 

2 Decomposition and Factors 

2.1 Decomposition and the Factor Model 

As was mentioned above, we assume that for the problem of finding a
decomposition (1) of I with the matrix product defined by (2), the set L of
grades forms a bounded scale equipped with an aggregation operation ⊗. In
particular, we assume that L is a complete lattice bounded by 0 and 1 and
that ⊗ is a binary operation on L that is commutative, associative, has 1
as its neutral element, and commutes with suprema, i.e.

$$a \otimes \bigvee_{k \in K} b_k = \bigvee_{k \in K} (a \otimes b_k).$$

Note that this in particular implies a ⊗ 1 = a. It is well-known, see e.g.
[11, 12, 14], that for any such operation, one may define its residuum → by

$$a \to b = \max\{c \in L \mid a \otimes c \le b\}.$$

The residuum satisfies an important technical condition called adjointness,
namely,

$$a \otimes b \le c \quad\text{iff}\quad a \le b \to c.$$

L together with ⊗ and → satisfying the above conditions forms a complete
residuated lattice [31].
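
As a small check of the residuum and adjointness, the following sketch works on the five-element chain {0, 1/4, 1/2, 3/4, 1} with two standard operations (Łukasiewicz and minimum). The function names and the choice of scale are illustrative assumptions; exact fractions are used to avoid rounding questions.

```python
from fractions import Fraction
from itertools import product

SCALE = [Fraction(i, 4) for i in range(5)]  # assumed five-element chain


def luk(a, b):
    """Lukasiewicz operation: max(0, a + b - 1)."""
    return max(Fraction(0), a + b - 1)


def luk_res(a, b):
    """Closed form of the Lukasiewicz residuum: min(1, 1 - a + b)."""
    return min(Fraction(1), 1 - a + b)


def goedel(a, b):
    """Minimum operation."""
    return min(a, b)


def goedel_res(a, b):
    """Closed form of the residuum of minimum."""
    return Fraction(1) if a <= b else b


def residuum_by_definition(tnorm, a, b, scale=SCALE):
    """a -> b = max{c in L | a (x) c <= b}, computed by brute force."""
    return max(c for c in scale if tnorm(a, c) <= b)


def adjoint(tnorm, res, scale=SCALE):
    """Check a (x) b <= c  iff  a <= b -> c for all triples in the scale."""
    return all((tnorm(a, b) <= c) == (a <= res(b, c))
               for a, b, c in product(scale, repeat=3))


print(adjoint(luk, luk_res), adjoint(goedel, goedel_res))  # True True
```

The brute-force `residuum_by_definition` agrees with both closed forms on this chain, which is what the definition of the residuum predicts.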

Complete residuated lattices are well known in fuzzy logic, where they are
used as the structures of truth degrees with ⊗ and → being the truth
functions of (many-valued) conjunction and implication. Important examples
include those with L = [0, 1] and ⊗ being a continuous t-norm, such as
a ⊗ b = min(a, b) (Gödel t-norm), a ⊗ b = a · b (Goguen t-norm), and
a ⊗ b = max(0, a + b − 1) (Łukasiewicz t-norm); or L being a finite chain
equipped with the restriction of the Gödel t-norm, the Łukasiewicz t-norm,
or another suitable operation. Since these matters are routinely known, we
omit details and refer the reader to [11, 12, 13] for further examples and
properties of residuated lattices.

Consider now the meaning of the factor model given by (1) and (2). The
matrices A and B represent relationships between objects and factors, and
between factors and the original attributes. We interpret A_il as the
degree to which the factor l applies to the object i, i.e. the truth degree
of the proposition "factor l applies to object i"; and B_lj as the degree
to which the attribute j is a particular manifestation of the factor l,
i.e. the truth degree of the proposition "attribute j is a manifestation of
factor l". Therefore, due to basic principles of fuzzy logic, if I = A ∘ B,
the discovered factors explain the original relationship between objects
and attributes, represented by I, via A and B as follows: the degree I_ij
to which the object i has the attribute j equals the degree of the
proposition "there exists a factor l such that l applies to i and j is a
particular manifestation of l".

As the nature of the relationship between objects and attributes via factors 
is traditionally of interest, it is worth noting that in our case, the attributes are 
expressed by means of factors in a non-linear manner: 

Example 1. With the Łukasiewicz t-norm, let I = A ∘ B be




Then for Q_1 = (0.6 0.2) and Q_2 = (0.4 0.3) we have (Q_1 + Q_2) ∘ B =
(1.0 0.5) ∘ B = (0.4 0.8 0.6) ≠ (0.0 0.6 0.2) = (0.0 0.4 0.2) +
(0.0 0.2 0.0) = Q_1 ∘ B + Q_2 ∘ B.

2.2 Factors for Decomposition 

We now need to recall a result from [3] saying that optimal decompositions
of I may be attained by using formal concepts of I as factors. Denote by
L^U the set of all fuzzy sets in a set U with truth degrees from L, i.e.
the set of all mappings from U to L, and put X = {1, . . . , n} (objects)
and Y = {1, . . . , m} (attributes).

A formal concept of I is any pair (C, D) of fuzzy sets C ∈ L^X and
D ∈ L^Y for which C↑ = D and D↓ = C, where the operators ↑ : L^X → L^Y and
↓ : L^Y → L^X are defined by

$$C^{\uparrow}(j) = \bigwedge_{i \in X} (C(i) \to I_{ij}) \quad\text{and}\quad D^{\downarrow}(i) = \bigwedge_{j \in Y} (D(j) \to I_{ij}).$$

Here, ⋀ is the infimum in L (in our case, since X and Y are finite, the
infimum coincides with the minimum if L is linearly ordered). The set

$$\mathcal{B}(X, Y, I) = \{(C, D) \in L^X \times L^Y \mid C^{\uparrow} = D \text{ and } D^{\downarrow} = C\}$$

of all formal concepts of I is called the concept lattice of I and indeed
forms a complete lattice when equipped with a natural
subconcept-superconcept ordering, see [3] for details. Formal concepts are
simple models of concepts in the sense of traditional, Port-Royal logic.
For a formal concept (C, D), C and D are called the extent and the intent
of (C, D); the degrees C(i) and D(j) are interpreted as the degrees to
which the concept applies to object i and attribute j, respectively. The
graded setting takes into account that most concepts used by humans are
graded rather than clear-cut.
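
The two operators and the formal-concept condition can be sketched as follows. The sketch assumes a linearly ordered scale (so the infimum is a minimum), uses the Łukasiewicz residuum, and represents fuzzy sets as lists indexed by objects or attributes; the 2 × 2 matrix and all function names are made up for illustration.

```python
def luk_res(a, b):
    """Lukasiewicz residuum: a -> b = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def up(C, I, res):
    """C^up(j) = min over objects i of (C(i) -> I_ij)."""
    return [min(res(C[i], I[i][j]) for i in range(len(I)))
            for j in range(len(I[0]))]


def down(D, I, res):
    """D^down(i) = min over attributes j of (D(j) -> I_ij)."""
    return [min(res(D[j], row[j]) for j in range(len(D))) for row in I]


def is_formal_concept(C, D, I, res):
    """(C, D) is a formal concept iff C^up = D and D^down = C."""
    return up(C, I, res) == D and down(D, I, res) == C


# Hypothetical 2 x 2 matrix with grades that are exact in floating point.
I = [[1.0, 0.5],
     [0.5, 1.0]]

# Closing an arbitrary fuzzy set of objects yields a formal concept.
C = down(up([1.0, 0.0], I, luk_res), I, luk_res)
D = up(C, I, luk_res)
print(C, D, is_formal_concept(C, D, I, luk_res))
# [1.0, 0.5] [1.0, 0.5] True
```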

For a set ℱ = {(C_1, D_1), . . . , (C_k, D_k)} of formal concepts of I with
a fixed order given by the indices, denote by A_ℱ and B_ℱ the n × k and
k × m matrices defined by

$$(A_{\mathcal{F}})_{il} = (C_l)(i) \quad\text{and}\quad (B_{\mathcal{F}})_{lj} = (D_l)(j).$$

That is, the lth column of A_ℱ consists of grades assigned to the objects
by C_l and the lth row of B_ℱ consists of grades assigned to attributes by
D_l.

If I = A_ℱ ∘ B_ℱ, ℱ can be seen as a set of factors which fully explain the
data. In such a case, we call the formal concepts from ℱ factor concepts,
and the factors have a natural, easy-to-understand meaning, as is
demonstrated in Section 4. Let ρ(I) denote the Schein rank of I, i.e.

$$\rho(I) = \min\{k \mid I = A \circ B \text{ for some } n \times k \text{ and } k \times m \text{ matrices } A \text{ and } B\}.$$

The following theorem was proven in [3].

Theorem 1. For every n × m matrix I with entries from L there exists a set
ℱ ⊆ ℬ(X, Y, I) containing exactly ρ(I) formal concepts for which
I = A_ℱ ∘ B_ℱ.

The theorem says that, in a sense, formal concepts of I are optimal factors
for decompositions. It follows that when looking for decompositions of I,
one can restrict the search to the set of formal concepts instead of the
set of all possible decompositions.



3 Algorithm and Complexity of Decompositions 

To prevent misunderstanding, let us define our problem precisely. For a
given (that is, constant for the problem) structure of truth degrees, i.e.
a set L equipped with the lattice operations and ⊗ and →, the problem we
discuss is a minimization (optimization) problem [1] specified as follows:

Input: n × m matrix I with entries from L;

Feasible Solution: n × k and k × m matrices A and B with entries from L
for which I = A ∘ B;

Cost of Solution: k.

As indicated above, due to Theorem 1, we look for feasible solutions A and
B in the form A_ℱ and B_ℱ for some ℱ. Therefore, the algorithm we present
in Section 3.1 computes a set ℱ of formal concepts of I for which A_ℱ and
B_ℱ is a good feasible solution. Our algorithm runs in polynomial time but
produces only suboptimal solutions, i.e. |ℱ| ≥ ρ(I). As is shown in
Section 3.2, this is a consequence of a fundamental limitation. Namely,
unless P=NP, there does not exist a polynomial time algorithm producing
optimal solutions to the decomposition problem. We demonstrate
experimentally in Section 4, however, that the quality of the solutions
provided by our algorithm is reasonable.

In this section as well as in Section 4 we need the following "geometric"
insight. Let us note that every formal concept (C_l, D_l) ∈ ℱ induces a
matrix J_l = C_l ⊗ D_l given by

$$(C_l \otimes D_l)_{ij} = C_l(i) \otimes D_l(j), \tag{3}$$

the rectangular matrix induced by (C_l, D_l) (it results from the Cartesian
product of C_l and D_l). Then I = A_ℱ ∘ B_ℱ means that

$$I_{ij} = (J_1)_{ij} \vee \cdots \vee (J_k)_{ij}, \tag{4}$$

i.e. I is the ∨-superposition of the J_l's.

3.1 Algorithm 

Throughout this section, we assume that L is linearly ordered, i.e. a ≤ b
or b ≤ a for any two degrees a, b ∈ L. (The general, non-linear case can be
handled with no substantial difficulty but we prefer to keep things simple,
particularly because of the practical importance of the linear case.) In
such a case, (4) implies that I = A_ℱ ∘ B_ℱ if and only if for each
(i, j) ∈ {1, . . . , n} × {1, . . . , m} there exists (C_l, D_l) ∈ ℱ for
which

$$I_{ij} = C_l(i) \otimes D_l(j). \tag{5}$$

In case of (5), we say that (C_l, D_l) covers (i, j). This allows us to see
that the problem of finding a set ℱ of formal concepts of I for which
I = A_ℱ ∘ B_ℱ can be reformulated as the problem of finding ℱ such that
every pair from the set

$$U = \{(i, j) \mid I_{ij} \ne 0\} \tag{6}$$

is covered by some (C_l, D_l) ∈ ℱ. Since C_l(i) ⊗ D_l(j) ≤ I_ij is always
the case [2], we need not worry about overcovering. We now see that every
instance of our
decomposition problem may be rephrased as an instance of the well-known set 
cover problem, see e.g. [H |S] in which the set to be covered is U and the system 
of sets that may be used to cover U is 

$$\{\{(i, j) \in U \mid I_{ij} \le C(i) \otimes D(j)\} \mid (C, D) \in \mathcal{B}(X, Y, I)\}.$$

Accordingly, one can use the well-known greedy approximation algorithm [1]
for solving set cover to select a set ℱ of formal concepts for which
I = A_ℱ ∘ B_ℱ. However, this would be costly from the computational
complexity point of view. Namely, one would need to compute the possibly
rather large set ℬ(X, Y, I) first and, worse, repeatedly iterate over this
set in the greedy set cover algorithm.

Instead, we propose a different greedy algorithm. The idea is to supply
promising candidate factor concepts on demand during the factorization
procedure, as opposed to computing all candidate factor concepts
beforehand. The algorithm generates the promising candidate factor concepts
by looking for promising columns. A technical property which we utilize is
the fact that for each formal concept (C, D),

$$D = \bigcup_{j \in Y} \{{}^{D(j)}/j\}^{\downarrow\uparrow},$$

i.e. each intent D is a union of intents {D(j)/j}↓↑ [3], and that C = D↓ by
definition. Here, {D(j)/j} denotes a graded singleton, i.e.

$$\{{}^{D(j)}/j\}(j') = \begin{cases} D(j) & \text{if } j' = j, \\ 0 & \text{if } j' \ne j. \end{cases}$$

As a consequence, we may construct any formal concept by sequentially
adding {a/j} to the empty set of attributes. Our algorithm follows a greedy
approach that makes us select j ∈ Y and a degree a ∈ L which maximize the
size of

$$D \oplus_a j = \{(k, l) \in U \mid D^{+\downarrow}(k) \otimes D^{+\downarrow\uparrow}(l) \ge I_{kl}\}, \tag{7}$$

where D+ = D ∪ {a/j} and U denotes the set of pairs (i, j) of I (row i,
column j) for which the corresponding entry I_ij is not covered yet. At the
start, U is initialized according to (6). As the algorithm proceeds, U gets
updated by removing from it the pairs (i, j) which have been covered by the
selected formal concept (C, D). Note that |D ⊕_a j| is the number of
entries of I which are covered by the formal concept (D+↓, D+↓↑), i.e. by
the concept generated by D+, the intent of the current candidate concept
(C, D) extended by {a/j}. Therefore, instead of going through all possible
formal concepts and selecting the factors from them, we just go through
columns and degrees and add them repeatedly so as to maximize the value V
of the corresponding formal concepts, as long as such an addition increases
V. The resulting algorithm is summarized below.

Find-Factors(I)

 1  U ← {(i, j) | I_ij ≠ 0}
 2  ℱ ← ∅
 3  while U ≠ ∅
 4    do D ← ∅
 5       V ← 0
 6       select (j, a) that maximizes |D ⊕_a j|
 7       while |D ⊕_a j| > V
 8         do V ← |D ⊕_a j|
 9            D ← (D ∪ {a/j})↓↑
10            select (j, a) that maximizes |D ⊕_a j|
11       C ← D↓
12       ℱ ← ℱ ∪ {(C, D)}
13       for (i, j) ∈ U
14         do if I_ij ≤ C(i) ⊗ D(j)
15              then
16                U ← U \ {(i, j)}
17  return ℱ

The main loop of the algorithm (lines 3-16) is executed until all the
nonzero entries of I are covered by at least one factor in ℱ. The code
between lines 4 and 10 constructs an intent by adding the most promising
columns. After such an intent D is found, we construct the corresponding
factor concept and add it to ℱ. The loop between lines 13 and 16 ensures
that all matrix entries covered by the last factor are removed from U.
Obviously, the algorithm is sound and finishes after finitely many steps
(polynomial in terms of n and m) with a set ℱ of factor concepts.
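
The procedure can be sketched in Python as follows. This is an illustrative reading of Find-Factors rather than the authors' implementation: it assumes a finite linearly ordered scale passed in as a list that contains every entry of I, represents fuzzy sets as lists, breaks ties by taking the first maximizer, and recomputes closures naively without any optimization. All function names are hypothetical.

```python
def luk(a, b):
    """Lukasiewicz operation."""
    return max(0.0, a + b - 1.0)


def luk_res(a, b):
    """Lukasiewicz residuum."""
    return min(1.0, 1.0 - a + b)


def down(D, I, res):
    return [min(res(D[j], row[j]) for j in range(len(D))) for row in I]


def up(C, I, res):
    return [min(res(C[i], I[i][j]) for i in range(len(I)))
            for j in range(len(I[0]))]


def find_factors(I, scale, tnorm, res):
    """Greedy sketch of Find-Factors; returns a list of (extent, intent)."""
    n, m = len(I), len(I[0])
    U = {(i, j) for i in range(n) for j in range(m) if I[i][j] != 0}
    factors = []

    def extend(D, j, a):
        # Concept generated by D+ = D with grade a added for attribute j.
        D_plus = list(D)
        D_plus[j] = max(D_plus[j], a)
        C = down(D_plus, I, res)
        return C, up(C, I, res)

    def covered(C, D):
        # Still-uncovered pairs that the rectangle of (C, D) covers, cf. (7).
        return {(i, j) for (i, j) in U if tnorm(C[i], D[j]) >= I[i][j]}

    def best(D):
        candidates = [(len(covered(*extend(D, j, a))), j, a)
                      for j in range(m) for a in scale if a > D[j]]
        return max(candidates, key=lambda t: t[0], default=(0, None, None))

    while U:                                 # lines 3-16
        D, V = [0.0] * m, 0                  # lines 4-5
        size, j, a = best(D)                 # line 6
        while size > V:                      # lines 7-10
            V = size
            _, D = extend(D, j, a)
            size, j, a = best(D)
        C = down(D, I, res)                  # line 11
        factors.append((C, D))               # line 12
        U -= covered(C, D)                   # lines 13-16
    return factors
```

For example, on a small hypothetical matrix with grades from {0, 0.25, 0.5, 0.75, 1}, the returned concepts reconstruct the input exactly:

```python
I = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.75, 1.0]]
F = find_factors(I, [0.0, 0.25, 0.5, 0.75, 1.0], luk, luk_res)
recon = [[max(luk(C[i], D[j]) for C, D in F) for j in range(3)]
         for i in range(3)]
print(recon == I)  # True
```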

3.2 Complexity of Finding Optimal Decompositions 

As mentioned above, there is no guarantee that our algorithm finds an
optimal decomposition, i.e. one with k = ρ(I). The following theorem shows
that, unless P=NP, no polynomial time algorithm which finds optimal
decompositions exists.

Theorem 2. The decomposition problem, i.e. the problem to find for a given
n × m matrix I with grades an n × k matrix A and a k × m matrix B for which
I = A ∘ B with k as small as possible, is NP-hard.

Proof. The theorem is an easy consequence of established reductions, see
[21, 22] and also [SJ [THl [30]. Namely, by definition of NP-hardness of
optimization problems, we need to show that the corresponding decision
problem is NP-complete. The decision problem, which we denote by Π in what
follows, is to decide for a given I and positive integer k whether there
exists a decomposition I = A ∘ B with the inner dimension k or smaller.
Now, Π is NP-complete because the decision version of the set basis
problem, which is known to be NP-complete |2B], is reducible to it. The
decision version of the set basis problem is: Given a collection
S = {S_1, . . . , S_n} of sets S_i ⊆ {1, . . . , m} and a positive integer
k, is there a collection P = {P_1, . . . , P_k} of subsets
P_l ⊆ {1, . . . , m} such that for every S_i there is a subset
Q_i ⊆ {P_1, . . . , P_k} for which ⋃Q_i = S_i (i.e., the union of all sets
from Q_i is equal to S_i)? This problem is easily reducible to Π: Given S,
define an n × m matrix I by I_ij = 1 if j ∈ S_i and I_ij = 0 if j ∉ S_i.
Such a reduction works for every L and ⊗ because we always have 1 ⊗ 1 = 1
and 1 ⊗ 0 = 0 ⊗ 1 = 0 ⊗ 0 = 0. Namely, one can check that if I = A ∘ B for
n × k and k × m matrices A and B with entries from L, then P_l
(l = 1, . . . , k) and Q_i, defined by j ∈ P_l if B_lj = 1 and P_l ∈ Q_i if
A_il = 1, represent a solution to the set basis problem given by S.
Conversely, if P_l and Q_i represent a solution to the set basis problem,
the matrices A and B defined by B_lj = 1 if j ∈ P_l and B_lj = 0 if
j ∉ P_l, and A_il = 1 if P_l ∈ Q_i and A_il = 0 if P_l ∉ Q_i, are matrices
with entries from L which represent a solution to Π. □
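
The converse direction of the reduction can be illustrated on a tiny, made-up instance of the set basis problem. In the sketch below the collection S, the basis P, and the index lists Q are hypothetical, and `boolean_product` is the product (2) restricted to grades 0 and 1.

```python
def matrix_from_sets(S, m):
    """I_ij = 1 if j is in S_i, else 0 (attributes numbered 1..m)."""
    return [[1 if j in S_i else 0 for j in range(1, m + 1)] for S_i in S]


def boolean_product(A, B):
    """Product (2) on grades 0 and 1: max over l of min(A_il, B_lj)."""
    return [[max(min(row[l], B[l][j]) for l in range(len(B)))
             for j in range(len(B[0]))]
            for row in A]


m = 3
S = [{1, 2}, {2, 3}, {1, 2, 3}]   # collection to be represented
P = [{1, 2}, {2, 3}]              # candidate basis with k = 2
Q = [[0], [1], [0, 1]]            # Q_i as indices into P

I = matrix_from_sets(S, m)
A = [[1 if l in Q_i else 0 for l in range(len(P))] for Q_i in Q]
B = matrix_from_sets(P, m)        # B_lj = 1 iff j is in P_l

print(boolean_product(A, B) == I)  # True
```

Each row of A records which basis sets are united to obtain S_i, so a basis of size k yields a decomposition with inner dimension k.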

4 Examples and Experiments 

In Section 4.1 we examine in detail a factor analysis of 2004 Olympic
Decathlon data. We include this example to illustrate the notions involved
in our methods but most importantly to argue that the algorithm developed
in this paper can be used to obtain reasonable factors from data with
grades. In Section 4.2 we present results of an experimental evaluation of
our algorithm.

Table 1: 2004 Olympic Games decathlon

Scores of Top 5 Athletes

            10     lj    sp    hj    40    11    di    pv    ja    15
Sebrle     894   1020   873   915   892   968   844   910   897   680
Clay       989   1050   804   859   852   958   873   880   885   668
Karpov     975   1012   847   887   968   978   905   790   671   692
Macey      885    927   835   944   863   903   836   731   715   775
Warners    947    995   758   776   911   973   741   880   669   693

Incidence Data Table with Graded Attributes

            10     lj    sp    hj    40    11    di    pv    ja    15
Sebrle    0.50   1.00  1.00  1.00  0.75  1.00  0.75  0.75  1.00  0.75
Clay      1.00   1.00  0.75  0.75  0.50  1.00  0.75  0.50  1.00  0.50
Karpov    1.00   1.00  0.75  0.75  1.00  1.00  1.00  0.25  0.25  0.75
Macey     0.50   0.50  0.75  1.00  0.75  0.50  0.75  0.25  0.50  1.00
Warners   0.75   0.75  0.50  0.50  0.75  1.00  0.25  0.50  0.25  0.75

Legend: 10 — 100 meters sprint race; lj — long jump; sp — shot put; hj — high jump;
40 — 400 meters sprint race; 11 — 110 meters hurdles; di — discus throw; pv — pole
vault; ja — javelin throw; 15 — 1500 meters run.

4.1 Decathlon data 

Grades of ordinal scales are conveniently represented by numbers, such as
the Likert scale {1, . . . , 5}. In such a case we assume these numbers are
normalized and taken from the unit interval [0, 1]. As an example, the
Likert scale is represented by L = {0, 1/4, 1/2, 3/4, 1}. Due to the
well-known Miller's 7 ± 2 phenomenon [20], one might argue that we should
restrict ourselves to small scales.

In this section, we explore factors explaining the athletes' performance in
the event. Table 1 (top) contains the results of the top five athletes in
the 2004 Olympic Games decathlon in points, which are obtained using the
IAAF Scoring Tables for Combined Events. Note that the IAAF Scoring Tables
provide us with an ordinal scale and a ranking function assigning the scale
values to athletes. We are going to look at whether this data can be
explained using formal concepts as factors.

We first transform the data from Table 1 (top) to a five-element scale

$$L = \{0.00, 0.25, 0.50, 0.75, 1.00\} \tag{8}$$

by a natural transformation and rounding. Namely, for each of the
disciplines, we first take the lowest and highest scores achieved among all
athletes who have finished the decathlon event, see Table 2. Then, for each
discipline, we make a linear transform of values from the [min, max]
interval to the real unit interval. For instance, in case of lj (long
jump), we consider the function

$$f_{lj}(x) = \frac{x - 723}{1050 - 723} = \frac{x - 723}{327}, \tag{9}$$

and analogously for the other disciplines, cf. Table 2. Finally, for each
athlete we compute the value of functions like (9) and round the results to
the closest value from the discrete scale (8). That is, instead of working
with numerical values as in Table 1 (top), we use the graded dataset in
Table 1 (bottom), which describes the athletes' performance using the
five-element scale where the table entries are degrees to which athletes
achieve high scores for particular disciplines (with respect to the other
athletes participating in the event). As a consequence, the factors then
have a simple reading. Namely, the grades to which a factor applies to an
athlete can be described in natural language as "not at all", "little bit",
"half", "quite", "fully", or the like.

Table 2: Lowest and highest scores in the 2004 Olympic Games decathlon

            10     lj    sp    hj    40    11    di    pv    ja    15
lowest     782    723   672   670   673   803   661   673   598   466
highest    989   1050   873   944   968   978   905  1035   897   791

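The transformation behind (8) and (9) can be sketched as follows, using the lowest and highest scores from Table 2 and two rows of Table 1 (top). The function name `to_grade` and the use of Python's built-in `round` for "rounding to the closest value" are assumptions of this sketch; the text does not say how exact ties would be resolved, and none occur in these rows.

```python
# Per-discipline extremes from Table 2, in the column order
# 10, lj, sp, hj, 40, 11, di, pv, ja, 15.
LOWEST = [782, 723, 672, 670, 673, 803, 661, 673, 598, 466]
HIGHEST = [989, 1050, 873, 944, 968, 978, 905, 1035, 897, 791]


def to_grade(score, low, high, steps=4):
    """Linear map of [low, high] onto [0, 1], rounded to a multiple of 1/steps."""
    x = (score - low) / (high - low)
    return round(x * steps) / steps


def grade_row(scores):
    return [to_grade(s, lo, hi) for s, lo, hi in zip(scores, LOWEST, HIGHEST)]


sebrle = [894, 1020, 873, 915, 892, 968, 844, 910, 897, 680]
clay = [989, 1050, 804, 859, 852, 958, 873, 880, 885, 668]

print(grade_row(sebrle))
# [0.5, 1.0, 1.0, 1.0, 0.75, 1.0, 0.75, 0.75, 1.0, 0.75]
print(grade_row(clay))
# [1.0, 1.0, 0.75, 0.75, 0.5, 1.0, 0.75, 0.5, 1.0, 0.5]
```

Both rows agree with the corresponding rows of Table 1 (bottom).
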
Using shades of gray to represent grades from the five-element scale L, the
matrix I corresponding to Table 1 (bottom) can be visualized in the
following array (rows correspond to athletes, columns correspond to
disciplines, the darker the array entry, the higher the score):

The algorithm described in Section 3.1 found a set ℱ of 7 formal concepts
which factorize I, i.e. for which I = A_ℱ ∘ B_ℱ (note that in this example,
we have used the Łukasiewicz t-norm on L). These factor concepts are shown
in Table 3 in the order in which they were produced by the algorithm. In
addition, Fig. 1 shows the corresponding rectangular matrices, cf. (3).

For example, factor concept F_1 applies to Sebrle to degree 0.5, to both
Clay and Karpov to degree 1, to Macey to degree 0.5, and to Warners to
degree 0.75. Furthermore, this factor concept applies to attribute 10
(100 m) to degree 1, to attribute lj (long jump) to degree 1, to attribute
sp (shot put) to degree 0.75, etc. This means that an excellent performance
(degree 1) in 100 m, an excellent performance in long jump, a very good
performance (degree 0.75) in shot put, etc. are particular manifestations
of this factor concept. On the other hand, only a relatively weak
performance (degree 0.25) in javelin throw and pole vault are
manifestations of this factor.

Table 3: Factor concepts

F_i   Extent                            Intent
F_1   {.5/S, C, K, .5/M, .75/W}         {10, lj, .75/sp, .75/hj, .5/40, 11, .5/di, .25/pv, .25/ja, .5/15}
F_2   {S, .75/C, .25/K, .5/M, .25/W}    {.5/10, lj, sp, hj, .75/40, 11, .75/di, .75/pv, ja, .75/15}
F_3   {.75/S, .5/C, .75/K, M, .5/W}     {.5/10, .5/lj, .75/sp, hj, .75/40, .5/11, .75/di, .25/pv, .5/ja, 15}
F_4   {S, .75/C, .75/K, .5/M, W}        {.5/10, .75/lj, .5/sp, .5/hj, .75/40, 11, .25/di, .5/pv, .25/ja, .75/15}
F_5   {.75/S, .75/C, K, .75/M, .25/W}   {.75/10, .75/lj, .75/sp, .75/hj, .75/40, .75/11, di, .25/pv, .25/ja, .75/15}
F_6   {.75/S, .5/C, K, .75/M, .75/W}    {.75/10, .75/lj, .75/sp, .75/hj, 40, .75/11, .5/di, .25/pv, .25/ja, .75/15}
F_7   {S, C, .25/K, .5/M, .25/W}        {.5/10, lj, .75/sp, .75/hj, .5/40, 11, .75/di, .5/pv, ja, .5/15}

Figure 1: Factor concepts as rectangular patterns (panels F_1, . . . , F_7)

Therefore, a decomposition I = A_ℱ ∘ B_ℱ exists with 7 factors where:

$$A_{\mathcal{F}} = \begin{pmatrix}
0.50 & 1.00 & 0.75 & 1.00 & 0.75 & 0.75 & 1.00 \\
1.00 & 0.75 & 0.50 & 0.75 & 0.75 & 0.50 & 1.00 \\
1.00 & 0.25 & 0.75 & 0.75 & 1.00 & 1.00 & 0.25 \\
0.50 & 0.50 & 1.00 & 0.50 & 0.75 & 0.75 & 0.50 \\
0.75 & 0.25 & 0.50 & 1.00 & 0.25 & 0.75 & 0.25
\end{pmatrix},$$

$$B_{\mathcal{F}} = \begin{pmatrix}
1.00 & 1.00 & 0.75 & 0.75 & 0.50 & 1.00 & 0.50 & 0.25 & 0.25 & 0.50 \\
0.50 & 1.00 & 1.00 & 1.00 & 0.75 & 1.00 & 0.75 & 0.75 & 1.00 & 0.75 \\
0.50 & 0.50 & 0.75 & 1.00 & 0.75 & 0.50 & 0.75 & 0.25 & 0.50 & 1.00 \\
0.50 & 0.75 & 0.50 & 0.50 & 0.75 & 1.00 & 0.25 & 0.50 & 0.25 & 0.75 \\
0.75 & 0.75 & 0.75 & 0.75 & 0.75 & 0.75 & 1.00 & 0.25 & 0.25 & 0.75 \\
0.75 & 0.75 & 0.75 & 0.75 & 1.00 & 0.75 & 0.50 & 0.25 & 0.25 & 0.75 \\
0.50 & 1.00 & 0.75 & 0.75 & 0.50 & 1.00 & 0.75 & 0.50 & 1.00 & 0.50
\end{pmatrix}.$$

Again, using shades of gray, this decomposition can be depicted as: 



Fig. 2 demonstrates what portion of the data matrix I is explained using
just some of the factor concepts from ℱ. The first matrix, labeled 46%,
shows A_{ℱ_1} ∘ B_{ℱ_1} for ℱ_1 consisting of the first factor F_1 only.
That is, the matrix is just the rectangular pattern corresponding to F_1,
cf. Fig. 1. As we can see, this matrix is contained in I, i.e. approximates
I from below, in that (A_{ℱ_1} ∘ B_{ℱ_1})_ij ≤ I_ij for all entries (row i,
column j). Label 46% indicates that 46% of the entries of
A_{ℱ_1} ∘ B_{ℱ_1} and I are equal. In this sense, the first factor explains
46% of the data. Note, however, that several of the 54% = 100% − 46% of the
other entries of A_{ℱ_1} ∘ B_{ℱ_1} are close to the corresponding entries
of I, so a measure of closeness of A_{ℱ_1} ∘ B_{ℱ_1} and I which takes into
account also close entries, rather than exactly equal ones only, would
yield a number larger than 46%.

Figure 2: ∨-superposition of factor concepts (panels labeled 46%, 72%, 84%,
92%, 96%, 98%, 100%)

The second matrix in Fig. 2, with label 72%, shows A_{ℱ_2} ∘ B_{ℱ_2} for
ℱ_2 consisting of F_1 and F_2. That is, the matrix demonstrates what
portion of the data matrix I is explained by the first two factors. Again,
A_{ℱ_2} ∘ B_{ℱ_2} approximates I from below and 72% of the entries of
A_{ℱ_2} ∘ B_{ℱ_2} and I coincide now. Note again that even for the
remaining 28% of entries, A_{ℱ_2} ∘ B_{ℱ_2} provides a reasonable
approximation of I, as can be seen by comparing the matrices representing
A_{ℱ_2} ∘ B_{ℱ_2} and I, i.e. the one labeled 72% and the one labeled 100%.

Similarly, the matrices labeled 84%, 92%, 96%, 98%, and 100% represent
A_{ℱ_l} ∘ B_{ℱ_l} for l = 3, 4, 5, 6, 7, for sets ℱ_l of factor concepts
consisting of F_1, . . . , F_l. We can conclude from the visual inspection
of the matrices that already the first two or three factors explain the
data reasonably well.
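
The percentages quoted above can be recomputed from the data given in this section. The sketch below takes I from Table 1 (bottom) and the two factor matrices of the 7-factor decomposition, applies the Łukasiewicz operation, and counts entries on which the partial product equals I; the helper names `partial_product` and `percent_equal` are hypothetical.

```python
def luk(a, b):
    """Lukasiewicz operation."""
    return max(0.0, a + b - 1.0)


I = [  # Table 1 (bottom): Sebrle, Clay, Karpov, Macey, Warners
    [0.50, 1.00, 1.00, 1.00, 0.75, 1.00, 0.75, 0.75, 1.00, 0.75],
    [1.00, 1.00, 0.75, 0.75, 0.50, 1.00, 0.75, 0.50, 1.00, 0.50],
    [1.00, 1.00, 0.75, 0.75, 1.00, 1.00, 1.00, 0.25, 0.25, 0.75],
    [0.50, 0.50, 0.75, 1.00, 0.75, 0.50, 0.75, 0.25, 0.50, 1.00],
    [0.75, 0.75, 0.50, 0.50, 0.75, 1.00, 0.25, 0.50, 0.25, 0.75],
]
A = [  # object-factor matrix (columns F_1..F_7)
    [0.50, 1.00, 0.75, 1.00, 0.75, 0.75, 1.00],
    [1.00, 0.75, 0.50, 0.75, 0.75, 0.50, 1.00],
    [1.00, 0.25, 0.75, 0.75, 1.00, 1.00, 0.25],
    [0.50, 0.50, 1.00, 0.50, 0.75, 0.75, 0.50],
    [0.75, 0.25, 0.50, 1.00, 0.25, 0.75, 0.25],
]
B = [  # factor-attribute matrix (rows F_1..F_7)
    [1.00, 1.00, 0.75, 0.75, 0.50, 1.00, 0.50, 0.25, 0.25, 0.50],
    [0.50, 1.00, 1.00, 1.00, 0.75, 1.00, 0.75, 0.75, 1.00, 0.75],
    [0.50, 0.50, 0.75, 1.00, 0.75, 0.50, 0.75, 0.25, 0.50, 1.00],
    [0.50, 0.75, 0.50, 0.50, 0.75, 1.00, 0.25, 0.50, 0.25, 0.75],
    [0.75, 0.75, 0.75, 0.75, 0.75, 0.75, 1.00, 0.25, 0.25, 0.75],
    [0.75, 0.75, 0.75, 0.75, 1.00, 0.75, 0.50, 0.25, 0.25, 0.75],
    [0.50, 1.00, 0.75, 0.75, 0.50, 1.00, 0.75, 0.50, 1.00, 0.50],
]


def partial_product(A, B, l):
    """Product (2) restricted to the first l factors."""
    return [[max(luk(row[f], B[f][j]) for f in range(l))
             for j in range(len(B[0]))]
            for row in A]


def percent_equal(I, J):
    """Percentage of entries on which I and J coincide."""
    total = sum(len(row) for row in I)
    equal = sum(x == y for ri, rj in zip(I, J) for x, y in zip(ri, rj))
    return 100 * equal / total


print([percent_equal(I, partial_product(A, B, l)) for l in range(1, 8)])
# [46.0, 72.0, 84.0, 92.0, 96.0, 98.0, 100.0]
```

Since all grades are multiples of 0.25, the floating-point comparisons here are exact.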

Let us now focus on the interpretation of the factors. Fig. 1 is helpful as
it shows the clusters corresponding to the factor concepts which draw
together the athletes and their performances in the events.

Factor F_1: Manifestations of this factor with grade 1 are 100 m, long
jump, and 110 m hurdles. This factor can be interpreted as the ability to
run fast for short distances (speed). Note that this factor applies
particularly to Clay and Karpov, which is well known in the world of
decathlon.

Factor F_2: Manifestations of this factor with grade 1 are long jump, shot
put, high jump, 110 m hurdles, and javelin. F_2 can be interpreted as the
ability to apply very high force in a very short term (explosiveness). F_2
applies particularly to Sebrle, and then to Clay, who are known for this
ability.

Factor F_3: Manifestations with grade 1 are high jump and 1500 m. This
factor is typical for lighter, not very muscular athletes (too much muscle
prevents jumping high and running long distances). Macey, who is evidently
that type among decathletes (196 cm and 98 kg), is the athlete to whom the
factor applies to degree 1. These are the most important factors behind
data matrix I.

Table 4: Exact factorizability

        Łukasiewicz ⊗       minimum ⊗
 k      no. of factors      no. of factors
 5       5.205 ± 0.460       6.202 ± 1.037
 7       7.717 ± 0.878      10.050 ± 1.444
 9      10.644 ± 1.316      13.379 ± 1.676
11      13.640 ± 1.615      15.698 ± 1.753
13      16.423 ± 1.879      17.477 ± 1.787
15      18.601 ± 2.016      18.721 ± 1.863

4.2 Experimental Evaluation 

We now present experiments with exact and approximate factorization of
selected publicly available datasets and randomly generated matrices, and
their evaluation. First, we observed how close the number of factors found
by the algorithm Find-Factors is to a known number of factors in
artificially created matrices. In this experiment, we generated 20 × 20
matrices according to various distributions of 5 grades. These matrices
were generated by multiplying n × k and k × m matrices. Therefore, the
resulting matrices were factorizable with at most k factors. Then, we
executed the algorithm to find ℱ and observed how close the number |ℱ| of
factors is to k. The results are shown in Table 4. We have observed that in
the average case, the choice of a t-norm is not essential and all t-norms
give approximately the same results. In particular, Table 4 describes
results for the Łukasiewicz and minimum t-norms. Rows of Table 4 correspond
to numbers k = 5, 7, . . . , 15 denoting the known number of factors. For
each k, we computed the average number of factors produced by our algorithm
in 2000 k-factorizable matrices. The average values are written in the form
of "average number of factors ± standard deviation".

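The construction of the test matrices can be sketched as follows. This is a
minimal illustration and not the authors' code: it assumes that the product ∘ is
the sup-t-norm composition, that the 5 grades are 0, 0.25, ..., 1, and that
entries are drawn uniformly (the text only speaks of "various distributions").
The names compose and random_factorizable are hypothetical.

```python
import random

GRADES = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed 5-element equidistant scale


def t_lukasiewicz(a, b):
    """Lukasiewicz t-norm: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def compose(A, B, tnorm):
    """Sup-t-norm product: (A o B)[i][j] = max over l of tnorm(A[i][l], B[l][j])."""
    k = len(B)
    return [[max(tnorm(A[i][l], B[l][j]) for l in range(k))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def random_matrix(rows, cols, rng):
    """Matrix with entries drawn uniformly from GRADES (an assumption)."""
    return [[rng.choice(GRADES) for _ in range(cols)] for _ in range(rows)]


def random_factorizable(m, n, k, tnorm, seed=0):
    """Return (A, B, I) where I = A o B, so I has a decomposition with k factors."""
    rng = random.Random(seed)
    A = random_matrix(m, k, rng)
    B = random_matrix(k, n, rng)
    return A, B, compose(A, B, tnorm)


A, B, I = random_factorizable(20, 20, 5, t_lukasiewicz)
```

Running a factorization algorithm on I and comparing the number of factors it
returns with k reproduces the kind of comparison summarized in Tab. 4.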
As mentioned above, factorization and factor analysis of binary data is a
special case of our setting with L = {0, 1}, i.e. with the scale containing just
two grades. Then, the matrix product ∘ defined earlier coincides with the Boolean
matrix multiplication and the problem of decomposition of graded matrices
coincides with the problem of decomposition of binary matrices into the Boolean
product of binary matrices. We performed experiments with our algorithm in
this particular case with three large binary data sets (binary matrices) from the
Frequent Itemset Mining Dataset Repository.¹ In particular, we considered the
CHESS (3196 × 75 binary matrix), CONNECT (67557 × 129 binary matrix), and
MUSHROOM (8124 × 119 binary matrix) data sets. The results are shown in
Fig. 3. The x-axes correspond to the number of factors (from 1 up to 50 factors
were observed) and the y-axes are percentages of data explained by the factors.
For example, we can see that the first 10 factors of CHESS explain more than
70% of the data, i.e. A_𝓕 ∘ B_𝓕 covers more than 70% of the nonzero entries

1 http://fimi.cs.helsinki.fi/data/
Figure 3: Approximate factorization of Boolean matrices by first 50 factors

Figure 4: Factorization of graded incidence matrix FOREST FIRES



of CHESS for |𝓕| = 10. In all three cases, we can see a tendency that a
relatively small number of factors (compared to the number of attributes in the
datasets) covers a significant part of the data.
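The coverage measure used above for binary data can be illustrated with a small
sketch. It assumes that "percentage of data explained" means the share of
nonzero entries of the input matrix that are also 1 in the Boolean product of
the two factor matrices; the function names and the toy matrix are hypothetical.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is 1 iff A[i][l] = B[l][j] = 1 for some l."""
    k = len(B)
    return [[int(any(A[i][l] and B[l][j] for l in range(k)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def coverage(I, A, B):
    """Fraction of the nonzero entries of I that are 1 in the Boolean product of A and B."""
    P = boolean_product(A, B)
    ones = sum(1 for row in I for v in row if v)
    covered = sum(1 for i, row in enumerate(I)
                  for j, v in enumerate(row) if v and P[i][j])
    return covered / ones if ones else 1.0


# Toy 3 x 3 matrix covered by two rectangles (factors).
I = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# First factor only: rows {0, 1} and columns {0, 1}.
A1 = [[1], [1], [0]]
B1 = [[1, 1, 0]]

# Both factors: the second one has rows {1, 2} and columns {1, 2}.
A2 = [[1, 0], [1, 1], [0, 1]]
B2 = [[1, 1, 0], [0, 1, 1]]

print(coverage(I, A1, B1))  # 4 of the 7 nonzero entries are covered
print(coverage(I, A2, B2))  # all 7 nonzero entries are covered
```

Plotting this value against the number of factors used gives a curve of the
kind described for Fig. 3.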

A similar tendency can also be observed for graded incidence data. For
instance, we have used the algorithm in factor analysis of the FOREST
FIRES [7] dataset from the UCI Machine Learning Repository.² In its original
form, the dataset contains real values. It has therefore been transformed into a
graded incidence matrix representing the relationship between spatial coordinates
within the Montesinho park map (rows) and 50 different groups of environmental
and climate conditions (columns). The matrix entries are degrees (coming
from an equidistant Łukasiewicz chain L = { n/100 | n is an integer, 0 ≤ n ≤ 100 })
to which there has been a large area of burnt forest in the sector of the map
under the environmental conditions. Factor analysis of data in this form can
help reveal factors which contribute to forest burns in the park. The exact
factorization has revealed 46 factors which explain 50 attributes. As in the case
of the Boolean datasets, a relatively small number of factors explains large
portions of the data. For instance, more than 50% of the data is covered by 10
factors and more than 80% of the data is covered by 23 factors, see Fig. 4.
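Grades from the chain L can be handled exactly by storing the numerator n
instead of the fraction n/100. The sketch below assumes this integer
representation, which is not described in the text, and uses the standard
Łukasiewicz operations a ⊗ b = max(0, a + b − 1) and a → b = min(1, 1 − a + b)
rescaled to numerators. All names are hypothetical.

```python
N = 100  # grade n/N is stored as the integer numerator n, 0 <= n <= N


def luk_tnorm(a, b):
    """Lukasiewicz conjunction on numerators: max(0, a + b - N)."""
    return max(0, a + b - N)


def luk_residuum(a, b):
    """Lukasiewicz residuum on numerators: min(N, N - a + b)."""
    return min(N, N - a + b)


def adjoint_on(grades):
    """Check adjointness on the given grades: a (x) b <= c  iff  a <= (b -> c)."""
    return all((luk_tnorm(a, b) <= c) == (a <= luk_residuum(b, c))
               for a in grades for b in grades for c in grades)


# Check the adjointness property on a sample of the chain (every tenth grade).
print(adjoint_on(range(0, N + 1, 10)))
```

Working with integers avoids rounding errors that could otherwise make two
equal grades compare as different.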



2 http://archive.ics.uci.edu/ml/
5 Conclusions



We presented a novel approach to decomposition and factor analysis of matri- 
ces with grades, i.e. of a particular form of ordinal data. The factors in this 
approach correspond to formal concepts in the data matrix. The approach is 
justified by a theorem according to which optimal decompositions are attained 
by using formal concepts as factors. The relationship between the factors and 
original attributes is a non-linear one. An advantageous feature of the model is 
a transparent way of treating the grades which results in good interpretability 
of factors. We observed that the decomposition problem is NP-hard as an opti- 
mization problem. We proposed a greedy algorithm for computing suboptimal 
decompositions and provided results of experiments demonstrating its behavior. 
Furthermore, we presented a detailed example of factor discovery which
demonstrates that the method yields interesting factors from data. Since the
method developed naturally allows for a linguistic interpretation of factors,
it may be considered a step toward what might be regarded as a linguistic
factor analysis of qualitative data.

Future research will include the following topics. First, a comparison, both
theoretical and experimental, to other methods of matrix decomposition, in
particular to the methods emphasizing good interpretability, such as non-negative
matrix factorization [16]. Second, an investigation of approximate decompositions
of I, i.e. decompositions into A and B for which A ∘ B is approximately equal
to I with respect to a reasonable notion of approximate equality. Third,
development of further theoretical insight, focusing particularly on further
reducing the
space of factors to which the search for factors can be restricted. Fourth, study 
the computational complexity aspects of the problem of approximate factoriza- 
tion, in particular the approximability of the problem of finding decompositions 
of matrix I pQ. Fifth, explore further the applications of the decompositions 
studied in this paper, particularly in areas such as psychology, sports data, or 
customer surveys, where ordinal data is abundant. 